Record · Rewind · Edit · Resume — without re-running anything.
📖 Docs · 🚀 Examples · 🛡️ Sentinel · 📊 Benchmarks
pip install ai-agent-vcr
No API keys. No cloud. Runs entirely locally.
✅ With Agent VCR
player = VCRPlayer.load("run.vcr")
# Jump to step 8 (frame index 7) and see what went wrong
state = player.goto_frame(7)
# Fix it and resume from frame 7; frames 0-6 are not re-run
player.resume(agent, ResumeConfig(
    from_frame=7,
    state_overrides={"prompt": "fixed"}
))
- Jump to any step. Every node has a full state snapshot. Inspect input, output, and diffs.
- Fix a prompt, patch a tool output, or inject context, then resume from that point. No re-runs.
- Fork from any frame. Create parallel runs. Compare how fixes change downstream behavior.
- Save successful runs. Replay the same task instantly with zero tokens and zero cost.
- Real-time AST analysis catches duplicate functions and complexity spikes, and gets the agent to self-correct.
- Recording overhead stays under 5ms at P99. Benchmarked in CI on every commit. Safe for production.

from agent_vcr import VCRRecorder
recorder = VCRRecorder()
recorder.start_session("my_run")
# Your existing agent code — unchanged
state = {"query": "build a REST API"}
input_state = dict(state)  # snapshot the input before the node runs
state = planner(state)  # step 1
recorder.record_step("planner", input_state, state)
input_state = dict(state)  # snapshot again before the next node
state = coder(state)  # step 2
recorder.record_step("coder", input_state, state)
recorder.save()  # → .vcr/my_run.vcr

Or use the context manager, which keeps your frames even if the agent crashes:
with VCRRecorder() as recorder:
    recorder.start_session("my_run")
    # ... your agent code ...
# auto-saved on exit

from agent_vcr import VCRPlayer
from agent_vcr.models import ResumeConfig
player = VCRPlayer.load(".vcr/my_run.vcr")
# Inspect any step
print(player.goto_frame(0)) # {'query': 'build a REST API', ...}
print(player.goto_frame(1)) # {'plan': '...', 'steps': [...], ...}
print(player.get_errors()) # see what failed
# Diff two frames
diff = player.compare_frames(0, 1)
# {'added': {'plan': ...}, 'modified': {'query': ...}, ...}
# Fix and resume from step 1 with a different plan
player.resume(
    agent_callable=coder,
    config=ResumeConfig(
        from_frame=1,
        state_overrides={"plan": "use FastAPI instead of Flask"}
    )
)

from langgraph.graph import StateGraph
from agent_vcr import VCRRecorder
from agent_vcr.integrations.langgraph import VCRLangGraph
graph = StateGraph(MyState)
graph.add_node("planner", planner_node)
graph.add_node("coder", coder_node)
graph.add_edge("planner", "coder")
recorder = VCRRecorder()
graph = VCRLangGraph(recorder).wrap_graph(graph) # one line
result = graph.invoke({"query": "Build a todo app"})
recorder.save()

from crewai import Crew
from agent_vcr import VCRRecorder
from agent_vcr.integrations.crewai import VCRCrewAI
recorder = VCRRecorder()
recorder.start_session("crew_run")
crew = Crew(agents=[researcher, writer], tasks=[task1, task2])
result = VCRCrewAI(recorder).kickoff(crew)
recorder.save()

Install extras:
pip install "ai-agent-vcr[crewai]"
pip install "ai-agent-vcr[langgraph]"

from agent_vcr import VCRRecorder
from agent_vcr.integrations.langgraph import vcr_record
recorder = VCRRecorder()
@vcr_record(recorder, node_name="research_step")
def research(state: dict) -> dict:
    return {"findings": search(state["query"])}

Databases solved the partial-failure problem 40 years ago. Agents have the same problem: when your agent fails mid-run, you don't just have bad in-memory state. You have files written to disk that shouldn't exist.
Current tools only roll back state objects. The filesystem stays polluted.
Agent VCR wraps agent execution in real transactional semantics:
from agent_vcr import VCRRecorder
from agent_vcr.integrations.openhands import ACIDWorkspace
recorder = VCRRecorder()
acid = ACIDWorkspace("/my/workspace", recorder=recorder)
acid.begin(session_id="task-001") # isolated git branch
acid.savepoint(state, node_name="coder") # checkpoint state + filesystem
acid.savepoint(state, node_name="tester")
# Agent writes bad code at step 4 — rollback
acid.rollback(to_frame_index=1)
# git reset --hard → bad files are GONE from disk, not just hidden
acid.commit()  # merge clean branch into main

- BEGIN → isolated git branch per agent session. Parallel agents can't clobber each other.
- SAVEPOINT → checkpoints both VCR state AND filesystem. Every frame has a matching git commit.
- ROLLBACK → git reset --hard. Files your agent hallucinated are physically deleted.
- COMMIT → clean merge back into main.
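
The savepoint and rollback bookkeeping described above can be sketched without git. The following is a minimal illustration, not Agent VCR's implementation: it assumes full directory copies as a stand-in for git commits, and `SnapshotWorkspace` and `demo` are hypothetical names. What it shows is the core idea that each frame index maps to one filesystem checkpoint, and that rolling back restores the files and discards later checkpoints.

```python
import shutil
import tempfile
from pathlib import Path


class SnapshotWorkspace:
    """Toy savepoint/rollback over a directory (hypothetical; copies instead of git commits)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self._store = Path(tempfile.mkdtemp(prefix="snapshots-"))
        self._savepoints: list[tuple[dict, Path]] = []

    def savepoint(self, state: dict) -> int:
        """Copy the workspace, remember the state, and return the frame index."""
        index = len(self._savepoints)
        target = self._store / f"frame-{index}"
        shutil.copytree(self.root, target)
        self._savepoints.append((dict(state), target))
        return index

    def rollback(self, to_frame_index: int) -> dict:
        """Restore files and state from a savepoint and drop every later savepoint."""
        state, snapshot = self._savepoints[to_frame_index]
        shutil.rmtree(self.root)
        shutil.copytree(snapshot, self.root)
        for _, later in self._savepoints[to_frame_index + 1:]:
            shutil.rmtree(later)
        del self._savepoints[to_frame_index + 1:]
        return dict(state)


def demo() -> tuple[dict, list[str]]:
    workspace = Path(tempfile.mkdtemp(prefix="workspace-"))
    ws = SnapshotWorkspace(workspace)

    (workspace / "app.py").write_text("print('ok')\n")
    ws.savepoint({"step": "coder"})  # frame 0

    (workspace / "junk.py").write_text("this file should not exist\n")
    ws.savepoint({"step": "tester"})  # frame 1

    state = ws.rollback(to_frame_index=0)  # junk.py is removed from disk
    return state, sorted(p.name for p in workspace.iterdir())


if __name__ == "__main__":
    print(demo())
```

Copying the whole tree is far more expensive than a git commit and gives no branch isolation, so treat this only as a model of the frame-to-checkpoint mapping.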
python examples/acid_golden_run.py

When your agent succeeds, save the entire execution as a replayable ghost run. The next time you hit the same task, replay it instantly with zero LLM calls, zero tokens, and zero cost.
from agent_vcr.golden_cache import GoldenRunCache
cache = GoldenRunCache()
# After a successful run:
cache.save_golden_run("Build a REST API with JWT auth", recorder)
# Next time — instant, $0.00:
outputs, ledger = cache.replay("Build a REST API with JWT auth")
print(ledger)
# CostLedger(saved=100% | $0.0123 | 4,100 tokens | 2,349ms)

The CostLedger tracks original vs replay: tokens, dollars, milliseconds, and reduction percentage. The demo shows it live:
python examples/acid_golden_run.py

RUN 1: Original        RUN 2: Ghost Replay
Tokens: 4,100          Tokens: 0
Cost: $0.0123          Cost: $0.00
Latency: 2,350ms       Latency: 1ms
💰 Savings: 100% · $0.0123 · 4,100 tokens · 2,349ms
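
To make the numbers above concrete, here is a rough sketch of how a task-keyed replay cache and its savings ledger could work. It is an assumption-laden illustration, not the library's code: `fingerprint`, `ToyLedger`, and `ToyGoldenCache` are hypothetical names, and normalizing case and whitespace before hashing is a guess at how equivalent task descriptions might be matched.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(task_description: str) -> str:
    """Hash a case- and whitespace-normalized task description (normalization is an assumption)."""
    normalized = " ".join(task_description.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class ToyLedger:
    tokens_saved: int
    cost_saved_usd: float
    latency_saved_ms: float


class ToyGoldenCache:
    """In-memory stand-in for a golden-run cache (hypothetical)."""

    def __init__(self) -> None:
        self._runs: dict[str, dict] = {}

    def save(self, task: str, outputs: list, tokens: int,
             cost_usd: float, latency_ms: float) -> str:
        key = fingerprint(task)
        self._runs[key] = {
            "outputs": list(outputs),
            "tokens": tokens,
            "cost_usd": cost_usd,
            "latency_ms": latency_ms,
        }
        return key

    def replay(self, task: str, replay_latency_ms: float = 1.0) -> tuple[list, ToyLedger]:
        run = self._runs[fingerprint(task)]  # KeyError means no golden run was saved
        # A replay makes no LLM calls, so all original tokens and dollars are saved.
        ledger = ToyLedger(
            tokens_saved=run["tokens"],
            cost_saved_usd=run["cost_usd"],
            latency_saved_ms=run["latency_ms"] - replay_latency_ms,
        )
        return list(run["outputs"]), ledger


if __name__ == "__main__":
    cache = ToyGoldenCache()
    cache.save("Build a REST API with JWT auth", ["plan", "code"], 4100, 0.0123, 2350.0)
    print(cache.replay("Build a REST API with JWT auth"))
```

With the figures from the demo output, the latency saved is 2,350ms minus the 1ms replay, which matches the 2,349ms shown in the ledger.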
Run the terminal debugger on any recorded session:
vcr-tui .vcr/my_run.vcr

┌──────────────────────────────────────────────────────────┐
│ 📼 Agent VCR TUI Session: my_run · 8 frames │
├──────────────────────────────────────────────────────────┤
│ ▶ Frame 0 │ planner │ 100ms │ ● │
│ Frame 1 │ researcher │ 250ms │ ● │
│ Frame 2 │ coder │ 480ms │ ✗ ERROR │
│ Frame 3 │ tester │ 80ms │ ● │
├──────────────────────────────────────────────────────────┤
│ State at frame 0: │
│ { "query": "build a todo app", │
│ "context": "...", │
│ "plan": null } │
├──────────────────────────────────────────────────────────┤
│ ← → navigate │ e edit │ d diff │ r resume │ q quit │
└──────────────────────────────────────────────────────────┘
Keybindings:
- ← →: navigate frames
- e: edit state inline (opens editor, saves on exit)
- d: diff current frame vs previous
- r: resume from current frame
- f: fork current frame to new session
- q: quit
See your agent's full execution graph — forks, parallel branches, error paths:
vcr-server .vcr/
# Open localhost:8000

The dashboard renders your session as a DAG:
original_run ────────────────────────────────────────────► [done]
│ frame 3
╰──► fork_v1 ──► [coder] ──► [tester] ──► [done]
│
╰──► fork_v2 ──► [coder] ──► [done]
- Every fork is a branch node
- Error frames shown in red
- Click any node to inspect full state
- Live WebSocket streaming for in-progress sessions
"Code is cheap now. Good code is not." — Graham Neubig, OpenHands Chief Scientist
Sentinel watches every file an AI agent writes and catches quality violations in real time — before the agent moves on.
from openhands_sentinel import Sentinel
from agent_vcr import VCRRecorder
recorder = VCRRecorder()
sentinel = Sentinel(recorder=recorder)
sentinel.attach(runtime.event_stream)  # 3 lines, auto-intercepts every file write

python examples/sentinel_demo.py

STEP 1: Agent writes auth/utils.py
🛡️ SENTINEL: auth/utils.py — CLEAN ✓
STEP 2: Agent writes handlers.py
🛡️ SENTINEL: VIOLATIONS DETECTED!
CRITICAL hash_password() already exists in auth/utils.py:8 — reuse it
CRITICAL handle_auth_request() is 109 lines (max 40) — break it up
CRITICAL Cyclomatic complexity 32 (max 8) — simplify
WARNING 9 parameters (max 5) — use a config object
STEP 3: Agent self-corrects
🛡️ SENTINEL: handlers.py — CLEAN ✓ All issues resolved!
📼 Audit trail: .vcr/sentinel-demo.vcr
Or scan any directory standalone:
sentinel scan ./my-ai-project

| Without Sentinel | With Sentinel |
|---|---|
| Agent writes bad code | Agent writes bad code |
| Human reviews PR | Sentinel catches in <10ms |
| Human rejects PR | Agent self-corrects |
| Agent rewrites | (already done) |
| Human reviews again | Zero human time |
| Cost: 2× LLM + human hours | Cost: 1 extra LLM call |
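
For readers curious what AST-based checks of this kind look like, the sketch below uses Python's standard `ast` module. It is not Sentinel's implementation: `check_source` is a hypothetical function, the limits are copied from the demo output, and the complexity figure is a rough count of branching constructs, not a full cyclomatic-complexity calculation.

```python
import ast

# Limits taken from the demo output above; the real Sentinel defaults may differ.
MAX_LINES = 40
MAX_COMPLEXITY = 8
MAX_PARAMS = 5

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def check_source(source: str, known_functions: set[str]) -> list[str]:
    """Return human-readable violations for every function defined in `source`."""
    violations: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue

        # Duplicate detection: the name is already defined somewhere else.
        if node.name in known_functions:
            violations.append(f"{node.name}() already exists, reuse it")

        length = node.end_lineno - node.lineno + 1
        if length > MAX_LINES:
            violations.append(f"{node.name}() is {length} lines (max {MAX_LINES})")

        # Rough complexity: one path, plus one per branching construct.
        complexity = 1 + sum(isinstance(n, BRANCH_NODES) for n in ast.walk(node))
        if complexity > MAX_COMPLEXITY:
            violations.append(
                f"{node.name}() has complexity {complexity} (max {MAX_COMPLEXITY})"
            )

        args = node.args
        count = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
        if count > MAX_PARAMS:
            violations.append(f"{node.name}() has {count} parameters (max {MAX_PARAMS})")
    return violations


if __name__ == "__main__":
    code = "def hash_password(a, b, c, d, e, f):\n    return a\n"
    for violation in check_source(code, known_functions={"hash_password"}):
        print(violation)
```

Feeding the violation strings back to the agent as its next prompt is one plausible way to drive the self-correction step shown in the demo.
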
| Feature | 📼 Agent VCR | LangSmith | LangFuse | Arize Phoenix |
|---|---|---|---|---|
| Record traces | ✅ | ✅ | ✅ | ✅ |
| Time-travel to any step | ✅ | ❌ | ❌ | ❌ |
| Edit state & resume | ✅ | ❌ | ❌ | ❌ |
| Fork from any frame | ✅ | ❌ | ❌ | ❌ |
| ACID filesystem rollback | ✅ | ❌ | ❌ | ❌ |
| Ghost Replay (zero-cost) | ✅ | ❌ | ❌ | ❌ |
| Code quality guardian | ✅ Sentinel | ❌ | ❌ | ❌ |
| TUI debugger | ✅ | ❌ | ❌ | ❌ |
| Self-hosted / local | ✅ | ❌ Cloud | ✅ | ✅ |
| Setup (lines of code) | 3 | ~15 | ~10 | ~10 |

recorder = VCRRecorder(
    output_dir=".vcr",  # where to save sessions
    auto_save=True,     # flush frames to disk as you go
    diff_mode=False,    # also store state diffs (jsonpatch)
)
recorder.start_session(session_id="my_run", tags=["prod"])
recorder.record_step(node_name, input_state, output_state, metadata)
recorder.record_llm_call(node_name, prompt, response, tokens, cost_usd)
recorder.record_tool_call(node_name, tool_name, args, result)
recorder.record_error(node_name, input_state, error)
recorder.save() -> Path
recorder.fork(from_frame=3) -> VCRRecorder # branch from a frame
# Context manager: auto-saves on exit
with VCRRecorder() as r:
    r.start_session("run")
    ...

player = VCRPlayer.load(".vcr/my_run.vcr")
player = VCRPlayer.load_by_id("my_run")
player.goto_frame(index) # → dict (output state at frame N)
player.get_frame(index) # → Frame object
player.get_input_state(index) # → dict (input state at frame N)
player.list_nodes() # → ['planner', 'coder', ...]
player.get_errors() # → [Frame, ...]
player.compare_frames(a, b) # → {'added': {}, 'removed': {}, 'modified': {}}
player.get_total_latency() # → float (ms)
player.get_total_tokens() # → int
player.get_total_cost() # → float (USD)
player.resume(
    agent_callable,                  # your agent function
    config=ResumeConfig(
        from_frame=7,                # rewind to BEFORE step 7 ran
        state_overrides={"k": "v"},  # apply these before re-running
        mode=ResumeMode.FORK,        # FORK | REPLAY | MOCK
    )
) -> str  # new session ID

acid = ACIDWorkspace("/workspace", recorder=recorder)
acid.begin(session_id="task-001")
acid.savepoint(state, node_name="coder")
acid.rollback(to_frame_index=2) # git reset --hard
acid.commit()  # merge to main

from agent_vcr.golden_cache import GoldenRunCache
cache = GoldenRunCache(cache_dir=".vcr/golden")
cache.save_golden_run(task_description, recorder) -> str # fingerprint
cache.replay(task_description) -> (outputs, CostLedger)
cache.invalidate(task_description) -> bool
cache.list_runs() -> list[dict]

# Basic recording and playback
python examples/basic_usage.py
# Time-travel: rewind, edit state, resume (with assertion)
python examples/time_travel_demo.py
# LangGraph auto-instrumentation
python examples/langgraph_integration.py
# ACID transactions + Ghost Replay (most impressive demo)
python examples/acid_golden_run.py
# OpenHands Sentinel: agent self-correction live
python examples/sentinel_demo.py
# Async recording
python examples/async_example.py

Sessions are plain JSONL, one JSON object per line:
{"type": "session", "data": {"session_id": "my_run", "created_at": "2024-01-01T00:00:00Z", ...}}
{"type": "frame", "data": {"node_name": "planner", "input_state": {...}, "output_state": {...}, "metadata": {"latency_ms": 120}}}
{"type": "frame", "data": {"node_name": "coder", ...}}

- Human-readable: open in any text editor
- Git-diffable: review agent state changes in PRs
- Append-only: no rewrites, safe for concurrent agents
- Streamable: parse line-by-line, no full-file load required
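
Because each line is a standalone JSON object, a session can be read with nothing but the standard library. The sketch below streams frames one line at a time; `iter_frames` and `total_latency_ms` are hypothetical helpers, and the sample uses only the fields shown in the format example above.

```python
import io
import json
from typing import Iterator, TextIO


def iter_frames(stream: TextIO) -> Iterator[dict]:
    """Yield the `data` payload of every frame record, one line at a time."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("type") == "frame":
            yield record["data"]


def total_latency_ms(stream: TextIO) -> float:
    """Sum metadata.latency_ms over all frames, treating missing values as 0."""
    return sum(f.get("metadata", {}).get("latency_ms", 0) for f in iter_frames(stream))


# Sample session built from the field names shown in the format example.
SAMPLE = "\n".join([
    json.dumps({"type": "session", "data": {"session_id": "my_run"}}),
    json.dumps({"type": "frame", "data": {"node_name": "planner", "metadata": {"latency_ms": 120}}}),
    json.dumps({"type": "frame", "data": {"node_name": "coder", "metadata": {"latency_ms": 480}}}),
])

if __name__ == "__main__":
    print([frame["node_name"] for frame in iter_frames(io.StringIO(SAMPLE))])
    print(total_latency_ms(io.StringIO(SAMPLE)))
```

For a file on disk, pass an open file handle in place of the `io.StringIO` object; the generator never holds more than one line in memory.
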
Recording overhead is benchmarked in CI on every commit and must stay under 5ms P99.
pytest tests/benchmarks/ -v --benchmark-only

Results are published at ixchio.github.io/agent-vcr/dev/bench/.
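
If you want a quick local sanity check of a P99 budget without the benchmark suite, the following standard-library sketch shows one way to measure it. It is not the project's benchmark: `p99_ms` and `measure_overhead` are hypothetical helpers, and the lambda in the demo is a stand-in for a real `record_step` call.

```python
import json
import statistics
import time


def p99_ms(samples_ms: list[float]) -> float:
    """99th percentile: the last of the 99 cut points that split the data into 100 groups."""
    return statistics.quantiles(samples_ms, n=100)[98]


def measure_overhead(record_step, iterations: int = 1000) -> float:
    """Time `record_step` repeatedly and return the P99 latency in milliseconds."""
    samples = []
    for i in range(iterations):
        start = time.perf_counter()
        record_step({"step": i})
        samples.append((time.perf_counter() - start) * 1000.0)
    return p99_ms(samples)


if __name__ == "__main__":
    sink: list[str] = []
    # Stand-in workload: serialize the state and append it to an in-memory list.
    p99 = measure_overhead(lambda state: sink.append(json.dumps(state)))
    print(f"P99 overhead: {p99:.4f} ms (budget: 5 ms)")
```

Tail percentiles are noisy on a busy machine, so a single local run is only indicative; the CI benchmark remains the reference.
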
- Core recording and playback
- Time-travel resume with state injection
- FastAPI server with live WebSocket streaming
- LangGraph integration
- CrewAI integration
- Async recorder and player
- Terminal TUI debugger (vcr-tui)
- React dashboard with DAG visualization
- ACID Transactions (git-backed filesystem rollback)
- Ghost Replay (zero-cost replay of successful runs)
- 🛡️ OpenHands Sentinel (real-time code quality guardian)
- Context manager (with VCRRecorder() as r:)
- AutoGen integration
- Cloud storage backend (S3, GCS)
- Collaborative debugging (share sessions)
- Replay regression tests (run golden paths as CI assertions)
git clone https://github.com/ixchio/agent-vcr.git
cd agent-vcr
pip install -e ".[dev,tui]"
pytest tests/unit/ -v

See CONTRIBUTING.md for guidelines.
MIT — see LICENSE.
LangSmith shows you what happened. Agent VCR lets you change it.
pip install ai-agent-vcr
⭐ Star on GitHub · 📦 PyPI · 📖 Docs
Built with 🤍 by ixchio · MIT License
